Abstract

In this paper we derive the consistency of the penalized likelihood method for
the number of states of the hidden Markov chain in autoregressive models with Markov
regime. The model parameters are estimated with a SAEM-type algorithm. We also
test the null hypothesis of a hidden Markov model against an autoregressive process
with Markov regime.

Keywords: Autoregressive process, hidden Markov, switching, SAEM algorithm, 
penalized likelihood. 

1 Introduction 

This paper is devoted to the estimation of autoregressive models with Markov regime. Our
goals in this paper are:

• Estimate, by maximum likelihood (MLE), the parameters that define the regression
functions, the transition probabilities of the hidden Markov chain and the noise
variance, for a fixed number of states of the hidden Markov chain. The estimates are
computed via SAEM, a stochastic version of the EM algorithm.

• Test the null hypothesis of an HMM against an AR-MR.

• Derive the consistency of the penalized likelihood method for the number of states.

An autoregressive model with Markov regime (AR-MR) is a discrete-time process
defined by:

$$Y_n = f_{X_n}(Y_{n-1}, \theta_{X_n}) + \sigma \varepsilon_n \tag{1.1}$$

where $\{X_n\}$ is a Markov chain with finite state space $\{1, \dots, m\}$. The transition
probabilities are denoted $a_{ij} = P(X_n = j \mid X_{n-1} = i)$; they form an $m \times m$ transition
matrix $A$. The functions $\{f_1, \dots, f_m\}$ belong to a parameterized family

$$\{\, y \mapsto \theta_1 y + \theta_0 : \theta = (\theta_1, \theta_0) \in \Theta \,\} \tag{1.2}$$

where $\Theta$ is a compact subset of $\mathbb{R}^2$, and $\{\varepsilon_n\}$ is a sequence of independent identically
distributed standard normal random variables, $\mathcal{N}(0, 1)$. Since the process $\{X_n\}$ is not observable,
we are forced to work with simulations of the law of the hidden chain and to rely on the
observed data $\{Y_n\}$ for any inference task.
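
As an illustration of definition (1.1), the following Python sketch simulates a path of a linear AR-MR. It is not taken from the paper: the function name `simulate_ar_mr`, the zero-based regime labels, the uniform draw of the initial state and the `(theta0, theta1)` argument layout are assumptions made only for this example.

```python
import random


def simulate_ar_mr(n_obs, A, theta, sigma, y0=0.0, seed=0):
    """Simulate Y_n = theta1[X_n] * Y_{n-1} + theta0[X_n] + sigma * eps_n.

    A     -- m x m transition matrix, A[i][j] = P(X_n = j | X_{n-1} = i)
    theta -- list of (theta0, theta1) pairs, one per regime
    Returns the hidden path xs (labels 0..m-1) and the observations ys.
    """
    rng = random.Random(seed)
    m = len(A)
    x = rng.randrange(m)  # assumption: initial regime drawn uniformly
    y_prev = y0
    xs, ys = [], []
    for _ in range(n_obs):
        theta0, theta1 = theta[x]
        y = theta1 * y_prev + theta0 + sigma * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
        y_prev = y
        # draw the next regime from row x of the transition matrix
        x = rng.choices(range(m), weights=A[x])[0]
    return xs, ys


# Illustrative values only (not the settings used in the paper).
A = [[0.9, 0.1], [0.1, 0.9]]
xs, ys = simulate_ar_mr(500, A, [(1.0, 0.5), (-1.0, -0.5)], sigma=1.0)
```

Setting every `theta1` to zero gives the HMM special case discussed in this section.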

The use of a Markov regime offers possibilities for modelling time series "subject to
discrete shifts in regime - episodes across which the dynamic behavior of the series is markedly
different", as noted by Hamilton [17], who used an AR-MR model in econometrics
for the analysis of the U.S. annual GNP (gross national product) series, with two
regimes: contraction and expansion. Linear autoregressive processes with Markov regime
are also used in several electrical engineering areas, including tracking of manoeuvring
targets, failure detection and stochastic adaptive control (Douc et al. [10]).

An important class of AR-MR models is that of hidden Markov models (HMMs), for which the
functions $\{f_1, \dots, f_m\}$ are constants ($\theta_{1,i} = 0$ for all $i \in \{1, \dots, m\}$). HMMs are
used in many different areas: basic and applied sciences, industry, economics, finance,
image reconstruction, speech recognition, tomography, inverse problems, etc. [3], [22].

The advantage of the SAEM algorithm is that it moves easily between different
modal regions, which reduces the chance that the estimate gets trapped in a local maximum. The
particular structure of our problem allows an exact simulation of the distribution of
the hidden chain conditional on the observations, using the Carter-Kohn algorithm.

For the hypothesis test of an HMM against a linear AR-MR we follow the ideas of Giudici
et al. and obtain the usual asymptotic theory. Working with hidden graphical Gaussian
models, they used the likelihood-ratio test for HMMs and established that the standard
asymptotic theory remains valid.

When the number $m$ is unknown, likelihood ratio techniques fail to estimate $m$
because the regularity hypotheses are not satisfied. In particular, the model
is not identifiable in the sense of Dacunha-Castelle and Duflo [7] (p. 227), so the standard $\chi^2$
theory cannot be applied.

In the HMM framework, we distinguish two cases according to whether the state space of
the observed variables is finite or not. In the finite case, Finesso [12] gives a strongly
consistent penalized estimator $\hat m$ of $m$, assuming that $m$ belongs to a bounded subset of
the integers. Liu and Narayan [21] also assume this boundedness condition and
propose a strongly consistent and efficient $\hat m$, with the probability of underestimation
decaying exponentially fast with respect to $N$, while the probability of overestimation does not
exceed $cN^{-3}$. Gassiat and Boucheron prove the strong consistency of a penalized $\hat m$
without assuming an upper bound for $m$, with the probabilities of underestimation
and overestimation decaying exponentially fast. In the non-finite case the likelihood ratio
is not bounded; Gassiat and Keribin [15] show that it diverges to infinity. As far
as we know, the divergence rate remains unknown. Gassiat gives results on penalized
likelihood that yield weak consistency of the estimator of the number of
states. We obtain strong consistency for a penalized $\hat m$ in a linear AR-MR, with $m$ in a
bounded set.

The paper is organized as follows. The main assumptions are given in Section 2. In Section
3, for a fixed number of states of the hidden Markov chain, a SAEM-type algorithm is used
to estimate the parameters; we present the method used to simulate the hidden Markov
chain and its convergence properties. In Section 4 we present our results on the
analysis of the LR test. In Section 5 we derive the consistency of the penalized likelihood
method for the number of states. For the sake of clarity, the proof of Lemma 1 is
relegated to Appendix A. Appendix B is devoted to simulations.



2 Notation and assumptions 

Let $y_{1:N} = (y_1, \dots, y_N)$ denote the observations and $X_{1:N} = (X_1, \dots, X_N)$ the associated
vector of hidden variables. Using $p$ as a generic symbol for densities and distributions,
the likelihood function is given by

$$p(y_{1:N} \mid y_0, \psi) = \sum_{x \in \{1,\dots,m\}^N} p(y_{1:N}, X_{1:N} = x \mid y_0, \psi), \tag{2.1}$$

where $x = (x_1, \dots, x_N)$ and $\psi = (A, \theta, \sigma) \in \Psi$, with $\Psi = [0, 1]^{m^2} \times \Theta^m \times \mathbb{R}^+$. The
maximum likelihood estimate (MLE) $\hat\psi$ is defined as

$$\hat\psi = \arg\max_{\psi \in \Psi} p(y_{1:N} \mid y_0, \psi).$$

Suppose that $Y_0$, $\{X_n\}$ and $\{\varepsilon_n\}$ are mutually independent; then

$$p(y_n \mid X_n, \dots, X_0, y_{n-1}, \dots, y_0) = p(y_n \mid X_n, y_{n-1}). \tag{2.2}$$

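Using (2.2) and the Markov property of $\{X_n\}$ we have

$$\begin{aligned}
l_N(\psi) &= \log p(y_{1:N} \mid y_0, \psi) \\
&= \log \Big( \sum_{x_{1:N} \in \{1,\dots,m\}^N} p(y_{1:N}, X_{1:N} = x_{1:N} \mid y_0, \psi) \Big) \\
&= \log \Big( \sum_{x_{1:N} \in \{1,\dots,m\}^N} p(y_{1:N} \mid X_{1:N} = x_{1:N}, y_0, \psi)\, p(X_{1:N} = x_{1:N}) \Big) \\
&= \log \Big( \sum_{x_1=1}^{m} \cdots \sum_{x_N=1}^{m} \prod_{n=1}^{N} p(y_n \mid X_n = x_n, y_{n-1}) \prod_{n=1}^{N-1} a_{x_n, x_{n+1}}\, p(X_1 = x_1) \Big)
\end{aligned} \tag{2.3}$$

with

$$p(y_n \mid y_{n-1}, X_n = i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{(y_n - f_i(y_{n-1}, \theta_i))^2}{2\sigma^2} \Big).$$

For the consistency of the MLE we will assume the following conditions.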
(C1) The transition matrix $A$ is positive, that is, $a_{ij} > \delta$ for all $i, j \in \{1, \dots, m\}$, for
some $\delta > 0$.

This condition implies that there is a unique invariant distribution $\mu = (\mu_1, \dots, \mu_m)$.

(C2) $\sum_{i=1}^{m} \mu_i \log |\theta_{1,i}| < 0$.

This condition, together with the existence of the moments of $\varepsilon_1$, implies that the extended chain
$\{(Y_n, X_n)\}$ is a geometrically ergodic Markov chain on the state space $\mathbb{R} \times \{1, \dots, m\}$
under $\psi$ (see Yao and Attali [24]).

(C3) $b_+ := \sup_{\psi} \sup_{y_0, y_1, i} p(y_1 \mid y_0, i) < \infty$ and $E_{\psi_0}(|\log b_-(Y_1, Y_0)|) < \infty$, where
$b_-(y_1, y_0) := \inf_{\psi} \sum_{i=1}^{m} p(y_1 \mid y_0, i)$.

(C4) For all $i, j \in \{1, \dots, m\}$ and all $y, y' \in \mathbb{R}$, the functions $\psi \mapsto a_{ij}$ and $\psi \mapsto p(y' \mid y, i)$
are continuous.

(C5) The model is identifiable, in the sense that $P_\psi = P_{\psi^*}$ implies $\psi = \psi^*$. For this
it is sufficient that $\theta_i \neq \theta_j$ if $i \neq j$, up to an index permutation (Krishnamurthy and
Yin).

(C6) For all $i, j \in \{1, \dots, m\}$ and $y, y' \in \mathbb{R}$, the functions $\psi \mapsto a_{ij}$ and $\psi \mapsto p(y' \mid y, i)$ are
twice continuously differentiable over $O = \{\psi \in \Psi : |\psi - \psi_0| < \delta\}$.

(C7) Let $\nabla$ denote the gradient operator and $\nabla^2$ the Hessian matrix.

(a) $\sup_{\psi \in O} \sup_{i,j} \|\nabla \log a_{ij}\| < \infty$ and $\sup_{\psi \in O} \sup_{i,j} \|\nabla^2 \log a_{ij}\| < \infty$.

(b) $E_{\psi_0}\big(\sup_{\psi \in O} \sup_{i} \|\nabla \log p(Y_1 \mid Y_0, i)\|^2\big) < \infty$ and $E_{\psi_0}\big(\sup_{\psi \in O} \sup_{i} \|\nabla^2 \log p(Y_1 \mid Y_0, i)\|\big) < \infty$.

(C8) (a) For all $y, y' \in \mathbb{R}$ there exists an integrable function $h_{y,y'} : \{1, \dots, m\} \to \mathbb{R}^+$ such
that $\sup_{\psi \in O} p(y' \mid y, i) \le h_{y,y'}(i)$.

(b) For all $y \in \mathbb{R}$ there exist integrable functions $h^1_y : \mathbb{R} \to \mathbb{R}^+$ and $h^2_y : \mathbb{R} \to \mathbb{R}^+$
such that $\|\nabla \log p(y' \mid y, i)\| \le h^1_y(y')$ and $\|\nabla^2 \log p(y' \mid y, i)\| \le h^2_y(y')$ for
all $\psi \in O$.

In the next proposition we collect some of the results of Douc et al. [10] that pertain to our
work.

Proposition 1

i) Assume (C1)-(C4). Then

$$\lim_{N \to \infty} \sup_{\psi \in \Psi} |N^{-1} l_N(\psi) - H(\psi)| = 0, \quad P_{\psi_0}\text{-a.s.},$$

where $H(\psi) = E_{\psi_0}\big(\log p(Y_0 \mid Y_{-\infty:-1}, \psi)\big)$.

ii) Assume (C1)-(C5). Then

$$\lim_{N \to \infty} \hat\psi_N = \psi_0, \quad P_{\psi_0}\text{-a.s.}$$

iii) Assume (C1)-(C3) and (C6)-(C8). Then

$$-N^{-1} \nabla^2_\psi l_N(\hat\psi_N) \to I(\psi_0), \quad P_{\psi_0}\text{-a.s.}$$

iv) Assume (C1)-(C8) and that the Fisher information matrix for $\{Y_n\}$, $I(\psi_0)$, is
positive definite. Then

$$N^{1/2}(\hat\psi_N - \psi_0) \to \mathcal{N}(0, I(\psi_0)^{-1}), \quad P_{\psi_0}\text{-weakly}.$$

3 The estimation algorithm for fixed m 

Since the maximum likelihood estimator $\hat\psi$ is a solution of the equation $\nabla_\psi l(\psi) = 0$, and this equation
has no analytic solution, the maximization has to be performed numerically,
which involves the $m^N$ terms in equation (2.3). This restricts the model to short series of observations
and few states. For HMMs on a finite state space, Baum et al. [1]
introduced a forward-backward algorithm as an early version of the EM algorithm.
The EM algorithm was proposed by Dempster et al. [9] to maximize the log-likelihood with
missing data. Through a recursive method, it replaces the problem of maximizing
the log-likelihood by the problem of maximizing a functional of the completed
likelihood $p(y_{1:N}, x_{1:N} \mid \psi)$ of the model:

$$p(y_{1:N}, x_{1:N} \mid \psi) = \prod_{n=1}^{N} \Big[ \prod_{i,j=1}^{m} a_{ij}^{I_{ij}(x_{n-1}, x_n)} \prod_{i=1}^{m} p(y_n \mid y_{n-1}, i)^{I_i(x_n)} \Big]$$

where $I_A(\cdot)$ denotes the indicator function of the set $A$ and $I_{A \times B}(\cdot, \cdot) = I_A(\cdot) I_B(\cdot)$.
Let us describe step $t + 1$ of the algorithm. Set

$$Q(\psi, \psi^{(t)}) = E\big(\log p(Y_{1:N}, X_{1:N} \mid \psi) \mid Y_{1:N} = y_{1:N}, \psi^{(t)}\big).$$

Then $Q$ is the expectation of the log-likelihood of the complete data, conditional on
the observed data and on the value of the parameter computed at step $t$, $\psi^{(t)}$. We
have that $Q(\psi, \psi^{(t)})$ equals

$$\sum_{n=1}^{N-1} \sum_{i,j=1}^{m} E\big(I_{ij}(X_n, X_{n+1}) \mid Y_{1:N} = y_{1:N}, \psi^{(t)}\big) \log(a_{ij})
+ \sum_{n=1}^{N-1} \sum_{i=1}^{m} E\big(I_i(X_{n+1}) \mid Y_{1:N} = y_{1:N}, \psi^{(t)}\big) \log p(y_{n+1} \mid y_n, i). \tag{3.1}$$
EM is a two-step algorithm: the E step and the M step. In the E step we compute
$Q(\psi, \psi^{(t)})$, the expectation conditional on the observed data and the current value of the
parameter.

In the M step we choose

$$\psi^{(t+1)} = \arg\max_{\psi} Q(\psi, \psi^{(t)}).$$

The EM algorithm converges to a maximum-likelihood estimate for any initial value when
the complete-data likelihood function is in the exponential family and a differentiability
condition is satisfied.

In order to avoid local maxima, we have used a stochastic approximation of the EM
algorithm, the SAEM algorithm. This algorithm was developed by Celeux et al.,
and its convergence was proved by Delyon et al. [8]. The EM
algorithm is modified in the following way: the (E) step is split into a simulation step
(ES) and a stochastic approximation step (EA):

ES Sample one realization $x^{(t)}_{1:N}$ of the missing data vector under $p(x_{1:N} \mid y_{1:N}, \psi^{(t)})$.

EA Update the current approximation of the EM intermediate quantity according to:

$$Q_{t+1} = Q_t + \gamma_t \big(\log p(y_{1:N}, x^{(t)}_{1:N} \mid \psi) - Q_t\big)$$

where $(\gamma_t)$ satisfies the condition:

(RM) for all $t \in \mathbb{N}$, $\gamma_t \in [0, 1]$, $\sum_{t=1}^{\infty} \gamma_t = \infty$ and $\sum_{t=1}^{\infty} \gamma_t^2 < \infty$.
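
The stochastic approximation step can be illustrated in isolation. The sketch below is not from the paper; the helper names `sa_update` and `run_sa` are hypothetical. It applies the EA recursion to a stream of scalar values with the step size $\gamma_t = 1/t$, which satisfies (RM); with this particular choice the recursion reduces to a running average of the simulated quantities.

```python
def sa_update(q, target, gamma):
    """One stochastic-approximation step: Q_{t+1} = Q_t + gamma_t * (target - Q_t)."""
    return q + gamma * (target - q)


def run_sa(targets):
    """Apply sa_update with gamma_t = 1/t for t = 1, 2, ... and return the final value."""
    q = 0.0
    for t, value in enumerate(targets, start=1):
        q = sa_update(q, value, 1.0 / t)
    return q


# With gamma_t = 1/t the result is the arithmetic mean of the inputs.
final_q = run_sa([2.0, 4.0, 6.0, 8.0])
```

Other step sizes compatible with (RM) would weight recent simulations differently; $1/t$ is simply the easiest one to check by hand.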

3.1 ES step 

In this section we describe the simulation method used in the SAEM algorithm. For
sampling under the conditional distribution

$$p(x_{1:N} \mid y_{1:N}, \psi) = \mu_{x_1} p(y_1 \mid y_0, x_1)\, a_{x_1 x_2} \cdots a_{x_{N-1} x_N}\, p(y_N \mid y_{N-1}, x_N) / p(y_{1:N} \mid \psi),$$

for any $x_{1:N} = (x_1, \dots, x_N) \in \{1, \dots, m\}^N$, Carter and Kohn give a method that
constitutes a stochastic version of the forward-backward algorithm proposed by Baum et
al. [1]. It follows by observing that $p(x_{1:N} \mid y_{1:N}, \psi)$ can be decomposed as

$$p(x_{1:N} \mid y_{1:N}, \psi) = p(x_N \mid y_{1:N}, \psi) \prod_{n=1}^{N-1} p(x_n \mid x_{n+1}, y_{1:N}, \psi).$$

Provided that $X_{n+1}$ is known, $p(X_n \mid X_{n+1}, y_{1:N}, \psi)$ is a discrete distribution, which suggests
the following sampling strategy. For $n = 2, \dots, N$ and $i \in \{1, \dots, m\}$, compute and store
recursively the optimal filter $p(X_n \mid y_{1:n}, \psi)$ as

$$p(X_n = i \mid y_{1:n}, \psi) \propto p(y_n \mid y_{n-1}, X_n = i, \psi) \sum_{j=1}^{m} a_{ji}\, p(X_{n-1} = j \mid y_{1:n-1}, \psi).$$

Then sample $X_N$ from $p(X_N \mid y_{1:N}, \psi)$ and, for $n = N - 1, \dots, 1$, sample $X_n$ from

$$p(X_n = i \mid X_{n+1} = x_{n+1}, y_{1:n}, \psi) = \frac{a_{i x_{n+1}}\, p(X_n = i \mid y_{1:n}, \psi)}{\sum_{l=1}^{m} a_{l x_{n+1}}\, p(X_n = l \mid y_{1:n}, \psi)}.$$
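
The two passes described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the zero-based state labels, the `(theta0, theta1)` layout of the regime parameters and the explicit `init` argument (the law of $X_1$) are all choices made for the example.

```python
import math
import random


def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def forward_filter(ys, y0, A, theta, sigma, init):
    """Return filt[n][i] = p(X_{n+1} = i | y_1, ..., y_{n+1}) for a linear AR-MR."""
    m = len(A)
    pred = list(init)  # assumption: init[i] = P(X_1 = i)
    prev_y = y0
    filt = []
    for y in ys:
        w = [pred[i] * normal_pdf(y, theta[i][1] * prev_y + theta[i][0], sigma)
             for i in range(m)]
        total = sum(w)
        cur = [v / total for v in w]
        filt.append(cur)
        # one-step prediction: sum over j of a_{ji} * p(X_n = j | y_{1:n})
        pred = [sum(cur[j] * A[j][i] for j in range(m)) for i in range(m)]
        prev_y = y
    return filt


def backward_sample(filt, A, rng):
    """Draw one path from p(x_{1:N} | y_{1:N}) given the stored filter."""
    m = len(A)
    n_obs = len(filt)
    path = [0] * n_obs
    path[-1] = rng.choices(range(m), weights=filt[-1])[0]
    for n in range(n_obs - 2, -1, -1):
        # unnormalized weights a_{i, x_{n+1}} * p(X_n = i | y_{1:n})
        w = [A[i][path[n + 1]] * filt[n][i] for i in range(m)]
        path[n] = rng.choices(range(m), weights=w)[0]
    return path
```

The filter is normalized at every step, which keeps the weights on a usable numerical scale for long series; `random.choices` accepts unnormalized weights, so the backward pass does not need the denominator explicitly.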



As a consequence, the estimation procedure generates an ergodic Markov chain $\{x^{(t)}_{1:N}\}$
on the finite state space $\{1, \dots, m\}^N$, with $p(x_{1:N} \mid y_{1:N}, \psi)$ as its stationary distribution.
Ergodicity follows from irreducibility and aperiodicity, which hold by the positivity of the
kernel, that is,

$$K(x_{1:N} \mid x'_{1:N}, \psi) = p(x_N \mid y_{1:N}, \psi) \prod_{n=1}^{N-1} p(x_n \mid x_{n+1}, y_{1:n}, \psi) > 0.$$

In this case the standard ergodic result for finite Markov chains applies (for instance,
Kemeny and Snell [18]),

$$\|K^t(\cdot \mid x_{1:N}, \psi) - p(\cdot \mid y_{1:N}, \psi)\| \le C \rho^{t-1},$$

with $C = \mathrm{card}(\{1, \dots, m\}^N)$, $\rho = (1 - 2K^*)$ and $K^* = \inf K(x' \mid x, \psi)$, for $x, x' \in \{1, \dots, m\}^N$.

3.2 EA step 

Equation (3.1) suggests replacing the EA step by Robbins-Monro
approximations (see Duflo [11]) of the statistics $s = (s_1, s_2, s_3)$, defined by:

$$s_1^{(t+1)}(i, n) = s_1^{(t)}(i, n) + \gamma_t \big(I_i(x_n^{(t)}) - s_1^{(t)}(i, n)\big) \tag{3.2}$$

$$s_2^{(t+1)}(i) = s_2^{(t)}(i) + \gamma_t \big(N_i(x_{1:N}^{(t)}) - s_2^{(t)}(i)\big) \tag{3.3}$$

$$s_3^{(t+1)}(i, j) = s_3^{(t)}(i, j) + \gamma_t \big(N_{ij}(x_{1:N}^{(t)}) - s_3^{(t)}(i, j)\big) \tag{3.4}$$

where $N_i(x_{1:N}) = \sum_{n=1}^{N-1} I_i(x_n)$ and $N_{ij}(x_{1:N}) = \sum_{n=1}^{N-1} I_{ij}(x_n, x_{n+1})$ are sufficient
statistics for the hidden Markov chain.

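When $f_j(y, \theta_j) = \theta_j$, the maximization step is given by

$$a_{ij}^{(t+1)} = \frac{s_3^{(t+1)}(i, j)}{s_2^{(t+1)}(i)},$$

$$\theta_i^{(t+1)} = \frac{\sum_{n=1}^{N} s_1^{(t+1)}(i, n)\, y_n}{\sum_{n=1}^{N} s_1^{(t+1)}(i, n)},$$

$$(\sigma^2)^{(t+1)} = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{m} s_1^{(t+1)}(i, n) \big(y_n - \theta_i^{(t+1)}\big)^2,$$

and for $f_j(y, \theta_j) = \theta_{1,j}\, y + \theta_{0,j}$ by

$$a_{ij}^{(t+1)} = \frac{s_3^{(t+1)}(i, j)}{s_2^{(t+1)}(i)},$$

$$\theta_{1,i}^{(t+1)} = \frac{S_i \sum_{n=1}^{N} s_1^{(t+1)}(i, n)\, y_n y_{n-1} - \sum_{n=1}^{N} s_1^{(t+1)}(i, n)\, y_n \sum_{n=1}^{N} s_1^{(t+1)}(i, n)\, y_{n-1}}{S_i \sum_{n=1}^{N} s_1^{(t+1)}(i, n)\, y_{n-1}^2 - \big(\sum_{n=1}^{N} s_1^{(t+1)}(i, n)\, y_{n-1}\big)^2},$$

$$\theta_{0,i}^{(t+1)} = \frac{\sum_{n=1}^{N} s_1^{(t+1)}(i, n)\, y_n - \theta_{1,i}^{(t+1)} \sum_{n=1}^{N} s_1^{(t+1)}(i, n)\, y_{n-1}}{S_i},$$

$$(\sigma^2)^{(t+1)} = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{m} s_1^{(t+1)}(i, n) \big(y_n - f_i(y_{n-1}, \theta_i^{(t+1)})\big)^2,$$

where $S_i = \sum_{n=1}^{N} s_1^{(t+1)}(i, n)$.

Considering the observations $y_{1:N}$ as fixed, the previous expressions define explicitly,
in each of the two cases under study, the map $\psi = \hat\psi(s)$ between the sufficient
statistics and the parameter space that SAEM requires.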
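
For the linear case, the update of the regression parameters of one regime is a weighted least squares fit in which the weights are the current values of $s_1(i, n)$. The sketch below illustrates that computation only; the function name `m_step_linear` and its argument layout are assumptions made for this example, and the weights are taken as given rather than produced by a SAEM run.

```python
def m_step_linear(weights, ys, y0):
    """Weighted least squares of y_n on (1, y_{n-1}) for a single regime.

    weights -- current values s_1(i, n), n = 1..N, for the regime i of interest
    ys      -- observations y_1..y_N; y0 is the initial value
    Returns (theta0, theta1).
    """
    prev = [y0] + list(ys[:-1])
    total_w = sum(weights)
    s_y = sum(w * y for w, y in zip(weights, ys))
    s_p = sum(w * p for w, p in zip(weights, prev))
    s_pp = sum(w * p * p for w, p in zip(weights, prev))
    s_py = sum(w * p * y for w, p, y in zip(weights, prev, ys))
    theta1 = (total_w * s_py - s_y * s_p) / (total_w * s_pp - s_p * s_p)
    theta0 = (s_y - theta1 * s_p) / total_w
    return theta0, theta1


# Data generated exactly by y_n = 0.5 * y_{n-1} + 1 are fitted exactly,
# whatever positive weights are used.
theta0, theta1 = m_step_linear([0.2, 0.9, 0.5, 1.0], [1.0, 1.5, 1.75, 1.875], 0.0)
```

The denominator vanishes when the weighted lagged values show no variation, so a practical implementation would need to guard against that case.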



3.3 Convergence 

The simulation procedure generates $\{x^{(t)}_{1:N}\}$, a finite Markov chain. The hypotheses of
Delyon et al. [8] that ensure the convergence of the SAEM algorithm are no longer
satisfied, but in this case we can use Theorem 1 of Kuhn and Lavielle [20]:

Theorem 1 Suppose that the conditions that guarantee the convergence of the EM
algorithm hold, together with condition (RM) and the following hypotheses:

SAEM1 The function $p(y_{1:N} \mid \psi)$ and the function $\psi = \hat\psi(s)$ are $l$ times differentiable.

SAEM2 The function $\psi \mapsto K_\psi = K(\cdot \mid \cdot, \psi)$ is continuously differentiable on $\Psi$. The transition
probability $K_\psi$ generates a geometrically ergodic chain with invariant probability
$p(x_{1:N} \mid y_{1:N}, \psi)$. The chain $\{x^{(t)}_{1:N}\}$ takes values in a compact subset. The function $s$ is
bounded.

Then, with probability 1, $\lim_{t \to \infty} d(\psi^{(t)}, C) = 0$, where $C = \{\psi \in \Psi : \nabla_\psi l(\psi) = 0\}$ is the set of
stationary points.

In our case the hypotheses of the theorem are verified: (RM) is
satisfied by choosing the sequence $\gamma_t = 1/t$, SAEM1 holds because $\varepsilon_1$ is normally
distributed, and SAEM2 is a consequence of the discussion in Section 3.1. Hence the
theorem applies and gives the convergence.
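4 Hypothesis test

In this section we study the likelihood ratio test (LRT) for testing an HMM against
an AR-MR process. We prove that the standard theory for the LRT of a point null hypothesis
is valid. Let $\psi = (A, \theta_1, \theta_0, \sigma^2)$ and $\psi_0 = (A, 0, \theta_0, \sigma^2)$; the test we consider is

$$H_0 : \theta_1 = 0$$

against

$$H_1 : \theta_1 \neq 0.$$

Theorem 2 Assume that (C1)-(C8) hold. Then

$$2\big(l(\hat\psi) - l(\psi_0)\big) \to \chi^2_m$$

under $P_{\psi_0}$.

Proof: Using the Taylor expansion of $l(\psi)$ around $\hat\psi$,

$$l(\psi_0) - l(\hat\psi) = (\psi_0 - \hat\psi)^t \nabla_\psi l(\hat\psi) + \frac{1}{2} (\psi_0 - \hat\psi)^t \nabla^2_\psi l(\bar\psi) (\psi_0 - \hat\psi),$$

where $\bar\psi = \lambda \psi_0 + (1 - \lambda) \hat\psi$, $\lambda \in (0, 1)$. Also $\nabla_\psi l(\hat\psi) = 0$. So

$$-2\big(l(\psi_0) - l(\hat\psi)\big) = -\big[N^{1/2} (\psi_0 - \hat\psi)^t\big] \big[N^{-1} \nabla^2_\psi l(\bar\psi)\big] \big[N^{1/2} (\psi_0 - \hat\psi)\big].$$

Now, since $\hat\psi_N \to \psi_0$, so does $\bar\psi_N$, and using Proposition 1 (iii)-(iv),

$$N^{1/2} (\hat\psi_N - \psi_0) \to \mathcal{N}(0, I(\psi_0)^{-1}), \quad P_{\psi_0}\text{-weakly}$$

and

$$-N^{-1} \nabla^2_\psi l(\bar\psi_N) \to I(\psi_0), \quad P_{\psi_0}\text{-a.s.}$$

So the proof is complete.

The theorem says that we can use the LRT, which rejects $H_0$ if

$$-2\big(l(\psi_0) - l(\hat\psi)\big) > \chi^2_{m,\alpha},$$

where $\chi^2_{m,\alpha}$ is the upper $\alpha$-quantile of the $\chi^2_m$ distribution.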
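
The decision rule can be sketched numerically. The code below is an illustration rather than part of the paper: the function names are hypothetical, and to stay within the Python standard library the chi-square tail probability is computed with the closed form that holds for an even number of degrees of freedom only. The rule "statistic above the upper $\alpha$-quantile" is expressed equivalently as "tail probability below $\alpha$".

```python
import math


def chi2_sf_even(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total


def lrt_rejects(loglik_alt, loglik_null, df, alpha=0.05):
    """Return (statistic, reject) for the test statistic 2 * (l_alt - l_null)."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf_even(stat, df) < alpha


# Hypothetical log-likelihood values for a model with m = 2 regimes.
stat, reject = lrt_rejects(loglik_alt=-100.0, loglik_null=-105.0, df=2)
```

For an odd number of degrees of freedom a general chi-square routine would be needed instead of `chi2_sf_even`.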



5 Penalized estimation of the number of states

In this section we present a penalized likelihood method for selecting the number of states
$m$ of the hidden Markov chain $\{X_n\}$. For each value of $m \ge 1$ we consider the sets $\Psi_m$
and $\mathcal{M} = \bigcup_{m \ge 1} \Psi_m$, the collection of all the different models. For a fixed $m$, we have
seen in Section 3 that it is possible to estimate the unknown parameters of the model.
Hence it is possible to evaluate the log-likelihood of the chosen model, $l(\hat\psi_m)$.

As we assumed identifiability (C5), the true number of states $m_0$ is minimal,
that is, there does not exist a parameter $\psi_m \in \Psi_m$ with $m < m_0$ such that $\psi_m$ and $\psi_{m_0}$
induce an identical law for $\{Y_n\}_{n \ge 0}$. We say that $\hat m_N$ over-estimates the number of states $m_0$
if $\hat m_N > m_0$ and under-estimates it if $\hat m_N < m_0$.

The penalized maximum likelihood (PML) criterion is defined as:

$$C(N, m) = -\log p(y_{1:N} \mid y_0, x_1, \hat\psi_m(N)) + \mathrm{pen}(N, m),$$

where $\hat\psi_m(N)$ is the maximum likelihood estimator of $\psi \in \Psi_m$ based on $N$ observations and $\mathrm{pen}(N, m)$
is a positive and increasing function of $m$. The estimator of the number of states is defined
as follows:

$$\hat m(N) = \min\{\arg\min_{m \ge 1} C(N, m)\}.$$

In the following theorem we prove that, asymptotically, the PML estimator does not
under-estimate the number of states $m_0$.

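Theorem 3 Assume (C1)-(C5) and that $\lim_{N \to \infty} \mathrm{pen}(N, m) = 0$ for all $m$. Then

$$\liminf_{N \to \infty} \hat m(N) \ge m_0, \quad P_{\psi_0}\text{-a.s.}$$

Proof: From Proposition 1 (i) we have, for $\psi \in \Psi_m$,

$$H(\psi_0) - H(\psi) = E_{\psi_0}\Big(\log \frac{p(Y_0 \mid Y_{-\infty:-1}, \psi_0)}{p(Y_0 \mid Y_{-\infty:-1}, \psi)}\Big) := D(\psi_0, \psi).$$

Therefore, for $m < m_0$:

$$\lim_{N \to \infty} \inf_{\psi \in \Psi_m} N^{-1} \big[l(\psi_0) - l(\psi)\big] = \inf_{\psi \in \Psi_m} \big[H(\psi_0) - H(\psi)\big] = \inf_{\psi \in \Psi_m} D(\psi_0, \psi) > 0,$$

where $D(\psi_0, \psi) > 0$ since $m_0$ is minimal. We have:

$$\lim_{N \to \infty} N^{-1} \big[l(\hat\psi_{m_0}) - l(\hat\psi_m)\big] = D(\psi_0, \psi) > 0.$$

By the definition of $C(N, m)$ and by the assumption $\lim_{N \to \infty} \mathrm{pen}(N, m) = 0$,

$$\lim_{N \to \infty} N^{-1} \big[C(N, m) - C(N, m_0)\big] = D(\psi_0, \psi) > 0,$$

for any $m < m_0$. On the other hand, $C(N, \hat m(N)) - C(N, m_0) \le 0$ by the definition of
$\hat m(N)$, and we conclude that

$$\liminf_{N \to \infty} \hat m(N) \ge m_0, \quad P_{\psi_0}\text{-a.s.}$$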
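In the following we prove that, asymptotically, the PML estimator does not over-estimate the number of states.
Let us define the distribution

$$Q_m(y_{1:N} \mid x_1) = E_{p(\psi)}\big(p(y_{1:N} \mid y_0, x_1, \psi)\big),$$

where $p(\psi)$ is a prior distribution on $\Psi_m$. In the following we write the model in its
vector form,

$$y = Z\theta + e, \tag{5.1}$$

where $e = (\sigma\varepsilon_1, \dots, \sigma\varepsilon_N)$ and $y = y_{1:N}$. In the AR-MR case $\theta = ((\theta_{0,1}, \theta_{1,1}), \dots, (\theta_{0,m}, \theta_{1,m}))^t$ and

$$Z = \begin{pmatrix} (1, y_0) I_1(x_1) & \cdots & (1, y_0) I_m(x_1) \\ \vdots & & \vdots \\ (1, y_{N-1}) I_1(x_N) & \cdots & (1, y_{N-1}) I_m(x_N) \end{pmatrix},$$

while in the HMM case $\theta = (\theta_1, \dots, \theta_m)^t$ and

$$Z = \begin{pmatrix} I_1(x_1) & \cdots & I_m(x_1) \\ \vdots & & \vdots \\ I_1(x_N) & \cdots & I_m(x_N) \end{pmatrix}.$$

Given $x_1, y_0$, the likelihood function for the model (5.1) is

$$p(y \mid x_1, y_0, \psi) = \sum_{x_{2:N} \in \{1,\dots,m\}^{N-1}} p(y \mid X_{2:N} = x_{2:N}, x_1, \psi)\, p(X_{2:N} = x_{2:N} \mid x_1, \psi) \tag{5.2}$$

with

$$p(y \mid X_{2:N} = x_{2:N}, x_1, \psi) = \mathcal{N}(y - Z\theta \mid 0, \sigma^2 I_N),$$

$$p(X_{2:N} = x_{2:N} \mid x_1, \psi) = a_{x_1 x_2} \cdots a_{x_{N-1} x_N}.$$

Suppose the following dependence structure for the components of $\psi$,

$$p(\psi) = \Big(\prod_{i=1}^{m} p(A_i)\Big)\, p(\theta \mid \sigma^2)\, p(\sigma^2),$$

and suppose the following prior densities, which are conjugate for the likelihood function (5.2):

1. $\theta \sim \mathcal{N}(\theta \mid 0, \sigma^2 \Sigma) = (2\pi\sigma^2)^{-m/2} \det(\Sigma)^{-1/2} \exp\Big(-\frac{1}{2\sigma^2} \theta^t \Sigma^{-1} \theta\Big)$.

2. For $\sigma^2$ an inverted gamma $\mathcal{IG}$ is proposed,

$$\sigma^2 \sim \mathcal{IG}(\nu_0/2, u_0/2) = \frac{u_0^{\nu_0/2}}{2^{\nu_0/2}\, \Gamma(\nu_0/2)} (\sigma^2)^{-(\nu_0/2 + 1)} \exp\Big(-\frac{u_0}{2\sigma^2}\Big),$$

$$\Gamma(u) = \int_0^\infty s^{u-1} e^{-s}\, ds.$$

3. $A_i \sim \mathcal{D}(e)$, where $\mathcal{D}$ denotes a Dirichlet density with parameter vector $e = (1/2, \dots, 1/2)$,

$$\mathcal{D}(A_i \mid e) = \frac{\Gamma(m/2)}{\Gamma(1/2)^m} \prod_{j=1}^{m} a_{ij}^{-1/2}.$$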

The following Lemma gives a bound of the likelihood function normalized by Q m . 

Lemma 1 The prior distribution $p(\psi)$ satisfies, for all $m$ and all $y \in \mathbb{R}^N$, the following
inequality,

log off < Miv) WA9 + logrff ) 

logdet(E) (1 + uo), / t n \ N logdet(M) , ^ f N + v 
+ 2 + 2 0J hg(u + y t Py)--- 2 V ; -lo g r' 

where $M^{-1} = Z^t Z + \Sigma^{-1}$, $P = I - Z M Z^t$ and, for $N \ge 4$,

$$c_m(N) = -m \Big(\log \frac{\Gamma(m/2)}{\Gamma(1/2)} - \frac{m(m-1)}{4N} + \frac{1}{12N}\Big).$$

Lemma 1 constitutes a basic step in the proof of the following proposition, 

Proposition 2 Let $\hat m$ be the PML estimator of the number of states. Then for all $m_0$, all $\psi_0 \in \Psi_{m_0}$ and all
$m > m_0$:

El t N + v 

exp(7 + Apen(m ,m)) / (u + y Py)^~Q m (y\x 1 )dy 

m=m +l J{y} 

where $\Delta_N \mathrm{pen}(m_1, m_2) := \mathrm{pen}(N, m_1) - \mathrm{pen}(N, m_2)$,



r = ^^iog W+ c„ w+ iogr(|) 

logdet(E) AT logdet(M) fN + v 

+ 2 Y 2 ° g 

Proof: by Lemma 1, 

log p(y|yo.*i>vo </ 

Qm(l/kl) 
with

/ = ^^l hg(N) + Cm(N)+logr q) 
+ log^a + (1+20 logK + ylpy) _ * _ isgdiM _ log r ( n + ,„ 

also, 

^max 

P(m(iV) > mo) = H™( N ) = m )> 

m=mo+l 

and therefore, 

P(m(iV) = m) 



< P logp(y|y , x^ifaj) < sup logp(y |y , Si, ^) + Apen{m , m) 

V V>eAl 

< P(logp(j/|y ,£i,V'o) < logQ m (j/|zi) + / + Apen(m ,m)) 

— ^(^{logp(y\y ,x 1 ,i(> )<logQ rn (y\xi)+I+Apen(rn ,rn)}) 

= / I{logp(y|j/o,xi,Vo)<logQm(y|xi)+/+Apen(m ,m)}(2/) exp \ogp(y\y , X 1: 1p)dy 



< / exp(log(5m(2/|a:i) + 1 + Apen(m ,m))dy. 
'{y} 



get: 



P(m > m ) < exp(I' + Apen(m ,m)) (u + y t Py)^r l Q m (y\x 1 )dy. 

m=mn+l 



m=mo+l 

■ 

As a consequence of this result and the first Borel-Cantelli lemma, the convergence of $\hat m$
depends on the study of the series $\sum_N \sum_{m = m_0 + 1} \exp\big(I'(N, m) + \Delta_N \mathrm{pen}(m_0, m)\big)$.
In the following theorem we show that, almost surely, the PML estimator does not over-estimate the number of states $m_0$.

Theorem 4 If $\int (u_0 + y^t P y)^{(N + \nu_0)/2}\, Q_m(y \mid x_1)\, dy < \infty$ and $\lim_{N \to \infty} \mathrm{pen}(N, m) - \mathrm{pen}(N + 1, m) = 0$,
then

$$\hat m(N) \le m_0, \quad P_{\psi_0}\text{-a.s.}$$
Proof: Let us define $a_N = I'(N, m) + \Delta_N \mathrm{pen}(m_0, m)$. Observe that the series

$$\sum_{m = m_0 + 1}^{M} \sum_{N} \exp a_N < \infty$$

converges as a consequence of the ratio test, once we show that $\lim_{N \to \infty} (a_{N+1} - a_N) < 0$.

In fact,
i±|^<0, -log



r((AT+l+«o)/2) i n 

r((iv+^ )/2) 1 J 



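$$\begin{aligned}
\lim_{N \to \infty} \Delta_{N+1} \mathrm{pen}(m_0, m) - \Delta_N \mathrm{pen}(m_0, m)
&= \lim_{N \to \infty} \mathrm{pen}(N + 1, m_0) - \mathrm{pen}(N + 1, m) - \mathrm{pen}(N, m_0) + \mathrm{pen}(N, m) \\
&\le \lim_{N \to \infty} \mathrm{pen}(N, m) - \mathrm{pen}(N + 1, m) = 0.
\end{aligned}$$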



Then we have 

lim a N+1 — ax 

N—>oo 

m(m-l), fN + l\ , , . . l + log2 
= $!L ~2 l0g (— ) + + 1) - c„,(N) ^i- 

, /r((JV+ l + u )/2)\ . . 

- log I fT^y+^oT/^) J + AAr + 1 ^ era ( m 0' m ) ~ A N pen(m , m) < 

Thus $\sum_N P_{\psi_0}(\hat m(N) > m_0) < \infty$, and from the Borel-Cantelli lemma we conclude that
$P_{\psi_0}(\hat m(N) > m_0 \text{ i.o.}) = 0$. This is equivalent to saying that $\hat m(N) \le m_0$, $P_{\psi_0}$-a.s. ■



One of the most common choices is $\mathrm{pen}(N, m) = \frac{\log(N)}{2} \dim(\Psi_m)$ (Bayesian information
criterion, BIC). It is natural to use $\dim(\Psi_m) = m(m - 1) + m \dim(\Theta) + 1$.
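
The selection rule with the BIC penalty can be sketched as follows. The function names are hypothetical; the negative log-likelihood values are those reported in Table 1 of Appendix B for the simulated HMM, with $N = 500$ and $\dim(\Psi_m) = m^2 + 1$ as stated there.

```python
import math


def bic_penalty(n_obs, dim):
    """pen(N, m) = (log N / 2) * dim(Psi_m)."""
    return 0.5 * math.log(n_obs) * dim


def select_num_states(neg_loglik, n_obs, dim_of):
    """Return (m_hat, criterion) where criterion[m] = -log-likelihood + pen(N, m).

    neg_loglik -- dict mapping m to the minimized negative log-likelihood
    dim_of     -- function giving dim(Psi_m) for each m
    m_hat is the smallest minimizer, as in the definition of the PML estimator.
    """
    crit = {m: nll + bic_penalty(n_obs, dim_of(m)) for m, nll in neg_loglik.items()}
    best = min(crit.values())
    return min(m for m, c in crit.items() if c == best), crit


# Values from Table 1 (HMM simulation, N = 500).
table1 = {2: 802.32, 3: 419.09, 4: 417.70, 5: 464.70, 6: 445.89, 7: 436.26}
m_hat, crit = select_num_states(table1, 500, lambda m: m * m + 1)
# m_hat == 3, in agreement with the value reported in Appendix B
```

The penalties computed this way agree with the "pen" column of Table 1 up to rounding in the last digit.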



A Proof of Lemma 1 



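The proof of this Lemma is obtained by showing the existence of constants $C_1, C_2$ such
that:

$$p(y \mid x_{1:N}, \psi) \le C_1\, Q_m(y \mid x_{1:N}) \tag{A.1}$$

$$p(x_{2:N} \mid x_1, \psi) \le C_2\, Q_m(x_{2:N} \mid x_1). \tag{A.2}$$

This would imply that

$$\begin{aligned}
p(y \mid y_0, x_1, \psi) &= \sum_{x_{2:N} \in \{1,\dots,m\}^{N-1}} p(y \mid x_{1:N}, \psi)\, p(x_{2:N} \mid x_1, \psi) \\
&\le C_1 C_2 \sum_{x_{2:N} \in \{1,\dots,m\}^{N-1}} Q_m(y \mid x_{1:N})\, Q_m(x_{2:N} \mid x_1) \\
&= C_1 C_2\, Q_m(y \mid x_1),
\end{aligned}$$

and hence $p(y \mid y_0, x_1, \psi) \le C_1 C_2\, Q_m(y \mid x_1)$.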

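We proceed with the evaluation of $Q_m(x_{2:N} \mid x_1)$ following the proof given in the
appendix of [21]. Let

$$Q_m(x_{2:N} \mid x_1) = \prod_{i=1}^{m} \Big[\frac{\Gamma(m/2)}{\Gamma(N_i + m/2)} \prod_{j=1}^{m} \frac{\Gamma(N_{ij} + 1/2)}{\Gamma(1/2)}\Big]$$

and

$$\frac{p(x_{2:N} \mid x_1, \psi)}{Q_m(x_{2:N} \mid x_1)} \le \frac{\prod_{i=1}^{m} \prod_{j=1}^{m} (N_{ij}/N_i)^{N_{ij}}}{\prod_{i=1}^{m} \Big[\frac{\Gamma(m/2)}{\Gamma(N_i + m/2)} \prod_{j=1}^{m} \frac{\Gamma(N_{ij} + 1/2)}{\Gamma(1/2)}\Big]}. \tag{A.3}$$

The right side of equation (A.3) does not exceed

$$\Big[\frac{\Gamma(N + m/2)\, \Gamma(1/2)}{\Gamma(m/2)\, \Gamma(N + 1/2)}\Big]^m.$$

In Gassiat and Boucheron [13] it is noted that

$$m \log \Big[\frac{\Gamma(N + m/2)\, \Gamma(1/2)}{\Gamma(m/2)\, \Gamma(N + 1/2)}\Big] \le \frac{m(m-1)}{2} \log N + c_m(N),$$

where, for $N \ge 4$, $c_m(N)$ is chosen as:

$$c_m(N) = -m \Big(\log \frac{\Gamma(m/2)}{\Gamma(1/2)} - \frac{m(m-1)}{4N} + \frac{1}{12N}\Big).$$

Then:

$$\log \frac{p(x_{2:N} \mid x_1, \psi)}{Q_m(x_{2:N} \mid x_1)} \le \frac{m(m-1)}{2} \log N + c_m(N). \tag{A.4}$$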



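To evaluate $Q_m(y \mid x_{1:N})$ let us expand the expression

$$\begin{aligned}
p(y \mid y_0, x_{1:N}, \theta, \sigma^2)\, p(\theta \mid \sigma^2)\, p(\sigma^2) &= \mathcal{N}(y - Z\theta \mid 0, \sigma^2 I_N)\, \mathcal{N}(\theta \mid 0, \sigma^2 \Sigma)\, \mathcal{IG}(\sigma^2 \mid \nu_0/2, u_0/2) \\
&= (2\pi\sigma^2)^{-N/2} \exp\Big(-\frac{1}{2\sigma^2} (y - Z\theta)^t (y - Z\theta)\Big) \\
&\quad \times (2\pi\sigma^2)^{-m/2} \det(\Sigma)^{-1/2} \exp\Big(-\frac{1}{2\sigma^2} \theta^t \Sigma^{-1} \theta\Big) \\
&\quad \times \frac{u_0^{\nu_0/2}}{2^{\nu_0/2}\, \Gamma(\nu_0/2)} (\sigma^2)^{-(\nu_0/2 + 1)} \exp\Big(-\frac{u_0}{2\sigma^2}\Big).
\end{aligned}$$

The above is equivalent to

$$\frac{u_0^{\nu_0/2} (2\pi\sigma^2)^{-N/2} (2\pi\sigma^2)^{-m/2}}{2^{\nu_0/2}\, \Gamma(\nu_0/2) \det(\Sigma)^{1/2}} \exp\Big(-\frac{(\theta - \tilde\theta)^t M^{-1} (\theta - \tilde\theta)}{2\sigma^2}\Big) (\sigma^2)^{-(\nu_0/2 + 1)} \exp\Big(-\frac{u_0 + y^t P y}{2\sigma^2}\Big)$$

with $M^{-1} = Z^t Z + \Sigma^{-1}$, $\tilde\theta = M Z^t y$ and $P = I - Z M Z^t$. Integrating the last expression
with respect to $\theta$ and then to $\sigma^2$ we obtain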



N) 



u o/2 det(M) 1 / 2 r((AT + i;o)/2) 
^ a 2)7v/2 r ( Uo /2) det(S)V2( Mo + y t Py yN+v )/2- 



(A.5) 
This gives,

p(y\yo,x u iP) < p(y\y , xu 4>)(7ia 2 ) N / 2 T(v /2) det^) 1 / 2 ^ + y^Py)^)/ 2 
Qm{y\xi; N ) - u v o/2 det{M) l / 2 r({N + v )/2) 

eX p (~^(y - Z§Y(y - Z9)\ (7ra 2 ) N / 2 V(v /2) det(S) 1 / 2 ( Mo + y*Py)^+^)/ 2 

(2na 2 ) N / 2 u o/2 det{M)V 2 r{{N + v )/2) 
exp (-^) T(v /2) det(S) 1 /2( Mo + y tp y ){N+v Q )/2 

2 N / 2 u o/2 det(My/ 2 T((N + v Q )/2) ' 
With this expression and equation (A.4) we obtain Lemma 1. ■



B Simulations 

In this section we apply our results to some simulated data. We work with an HMM and
two AR-MR models. We use $\mathrm{pen}(N, m) = \frac{\log N}{2} \dim(\Psi_m)$ (BIC). We evaluate the likelihood function for
any set of parameters $\psi$ by computing

$$p(y_{1:N} \mid y_0, \psi) = \sum_{i=1}^{m} \alpha_N(i),$$

where $\alpha_n(i) = p(y_{1:n}, X_n = i)$ can be evaluated recursively with the forward formula
of Baum,

$$\alpha_n(j) = \sum_{i=1}^{m} \alpha_{n-1}(i)\, a_{ij}\, p(y_n \mid y_{n-1}, X_n = j),$$

see D. Le Nhu et al.
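
The forward recursion can be checked against the direct sum over all $m^N$ hidden paths on a short series. The sketch below is an illustration under assumptions, not the code used for the simulations: the function names, the zero-based state labels, the `(theta0, theta1)` layout and the explicit initial law `init` of $X_1$ are choices made for this example.

```python
import math
from itertools import product


def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def forward_likelihood(ys, y0, A, theta, sigma, init):
    """Likelihood p(y_{1:N} | y_0) computed with the forward recursion."""
    m = len(A)
    alpha = [init[i] * normal_pdf(ys[0], theta[i][1] * y0 + theta[i][0], sigma)
             for i in range(m)]
    prev = ys[0]
    for y in ys[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(m))
                 * normal_pdf(y, theta[j][1] * prev + theta[j][0], sigma)
                 for j in range(m)]
        prev = y
    return sum(alpha)


def brute_force_likelihood(ys, y0, A, theta, sigma, init):
    """Same likelihood as a sum over all m^N hidden paths (tiny N only)."""
    m = len(A)
    total = 0.0
    for path in product(range(m), repeat=len(ys)):
        p = init[path[0]]
        prev = y0
        for n, (x, y) in enumerate(zip(path, ys)):
            if n > 0:
                p *= A[path[n - 1]][x]
            p *= normal_pdf(y, theta[x][1] * prev + theta[x][0], sigma)
            prev = y
        total += p
    return total
```

The recursion costs on the order of $N m^2$ operations instead of $m^N$. For long series the unscaled $\alpha_n$ underflow, so a practical version would normalize at each step and accumulate the logarithms of the normalizing constants.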



B.1 HMMs

In the simulation of the HMM we set the following parameters: $\dim(\Psi_m) = m^2 + 1$,
$N = 500$, $m = 3$, $\sigma^2 = 1.5$, $\theta = (-2, 1, 4)$,

A = 




The observed series is plotted in Figure 1.

Table 1 contains the values of the penalized maximum likelihood criterion for $m = 2, \dots, 7$;
we observe that $\hat m = 3$. In this case $\psi$ is estimated by using SAEM, and the values are
$\hat\sigma^2 = 1.49$, $\hat\theta = (-1.98, 4.09, 0.91)$,

$$\hat A = \begin{pmatrix} 0.8650 & 0.0274 & 0.1076 \\ 0.0404 & 0.8943 & 0.0653 \\ 0.0658 & 0.0648 & 0.8694 \end{pmatrix}.$$

  m    -log p(y|ψ̂)      pen      -log p(y|ψ̂) + pen
  2      802.32        15.53        817.85
  3      419.09        31.07        450.16
  4      417.70        52.82        470.52
  5      464.70        80.78        545.48
  6      445.89       114.97        560.86
  7      436.26       155.36        591.62

Table 1: The values for the PML




Figure 1: The observed series $y_1, \dots, y_{500}$ for the HMM

Figure 2 displays the sequence $\{\psi^{(t)}\}$, $t = 1, \dots, 4000$, and we observe the
convergence of the estimate.

B.2 AR-MR

In the first simulation of the AR-MR we set the following parameters: $\dim(\Psi_m) = m(m + 1) + 1$,
$N = 500$, $m = 2$, $\sigma^2 = 1.5$,

$$\theta = \begin{pmatrix} 1 & -1 \\ -0.5 & 0.5 \end{pmatrix}, \quad A = \begin{pmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{pmatrix}.$$

The observed series is plotted in Figure 3.
Figure 2: Convergence of the estimates of $\sigma^2$, $\theta$ and $A$.

  m    -log p(y|ψ̂)      pen      -log p(y|ψ̂) + pen
  2      351.14        18.64        369.78
  3      346.64        37.28        383.92
  4      355.10        62.14        417.24
  5      354.52        93.21        447.73
  6      384.50       130.50        515.00

Table 2: The values for the PML

Table 2 contains the values of the penalized maximum likelihood criterion for $m = 2, \dots, 6$;
we observe that $\hat m = 2$. In this case $\psi$ is estimated by using SAEM, and the values are
$\hat\sigma^2 = 1.42$,

$$\hat\theta = \begin{pmatrix} 1.07 & -0.96 \\ -0.5 & 0.5 \end{pmatrix}, \quad \hat A = \begin{pmatrix} 0.8650 & 0.1350 \\ 0.1130 & 0.8870 \end{pmatrix}.$$

Figure 4 displays the sequence $\{\psi^{(t)}\}$, $t = 1, \dots, 1000$, and we observe the
convergence of the estimate.

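In the second simulation of the AR-MR we set the following parameters: $N = 500$,
$m = 2$, $\sigma^2 = 1.5$,

$$\theta = \begin{pmatrix} 1 & -2 \\ -0.7 & 1.08 \end{pmatrix}, \quad A = \begin{pmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{pmatrix}.$$

The observed series is plotted in Figure 3.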

In this case $m = 2$ is fixed and $\psi$ is estimated by using SAEM; the values are
$\hat\sigma^2 = 1.42$,

$$\hat\theta = \begin{pmatrix} 0.85 & -2.01 \\ -0.69 & 1.08 \end{pmatrix}, \quad \hat A = \begin{pmatrix} 0.9093 & 0.0907 \\ 0.019 & 0.9181 \end{pmatrix}.$$

Figure 6 displays the sequence $\{\psi^{(t)}\}$, $t = 1, \dots, 1000$, and we observe the
convergence of the estimate.
Figure 3: The observed series $y_1, \dots, y_{500}$ for the AR-MR

Figure 4: Convergence of the estimates of $\theta_1$, $\theta_2$, $\sigma^2$, and $A$.